289 research outputs found

Optimal-Hash Exact String Matching Algorithms

Author: Lecroq Thierry
Publication venue
Publication date: 10/03/2023
Field of study

String matching is the problem of finding all the occurrences of a pattern in a text. We propose improved versions of the fast family of string matching algorithms based on hashing $q$-grams. The improvement consists of considering minimal values of $q$ such that each $q$-gram of the pattern has a unique hash value. The new algorithms are faster than the algorithms of the HASH family for short patterns on large alphabets. Comment: 14 pages
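The idea of choosing a minimal $q$ can be illustrated with a small sketch. It assumes one reading of the abstract, namely that distinct $q$-grams of the pattern must receive distinct hash values. The shift-and-add hash `qgram_hash`, its `bits` parameter, and the function `minimal_q` are hypothetical names chosen for illustration and are not taken from the paper.

```python
def qgram_hash(gram: str, bits: int = 8) -> int:
    """Hypothetical shift-and-add hash, truncated to `bits` bits."""
    mask = (1 << bits) - 1
    h = 0
    for ch in gram:
        h = ((h << 1) + ord(ch)) & mask
    return h


def minimal_q(pattern: str, bits: int = 8) -> int:
    """Smallest q for which distinct q-grams of `pattern` hash to distinct values.

    Assumption: this is only one interpretation of "each q-gram has a
    unique hash value"; the paper may define the condition differently.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    for q in range(1, len(pattern) + 1):
        grams = {pattern[i:i + q] for i in range(len(pattern) - q + 1)}
        hashes = {qgram_hash(g, bits) for g in grams}
        if len(hashes) == len(grams):
            return q
    # Unreachable: q == len(pattern) yields a single q-gram, hence no collision.
    raise AssertionError("no collision-free q found")
```

With a wide hash most short patterns are already collision-free at `q = 1`; shrinking `bits` forces collisions and pushes the minimal value up, which is the trade-off the abstract alludes to.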

arXiv.org e-Print Archive

A fast implementation of the Boyer-Moore string matching algorithm

Author: Crochemore Maxime
Lecroq Thierry
Publication venue: HAL CCSD
Publication date: 01/01/2007
Field of study

Manuscript, http://www-igm.univ-mlv.fr/~lecroq/articles/cl2008.pd

HAL - Normandie Université

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Efficient Pattern Matching on Binary Strings

Author: Faro Simone
Lecroq Thierry
Publication venue
Publication date: 01/01/2008
Field of study

The binary string matching problem consists of finding all the occurrences of a pattern in a text where both strings are built on a binary alphabet. This is an interesting problem in computer science, since binary data are omnipresent in telecom and computer network applications. Moreover, the problem also finds applications in image processing and in pattern matching on compressed texts. Recently it has been shown that adaptations of classical exact string matching algorithms are not very efficient on binary data. In this paper we present two efficient algorithms for the problem, adapted to completely avoid any reference to bits, which allows the pattern and the text to be processed byte by byte. Experimental results show that the new algorithms outperform existing solutions in most cases. Comment: 12 pages

arXiv.org e-Print Archive

HAL - Normandie Université

CiteSeerX

Algorithms for Computing Abelian Periods of Words

Author: Fici Gabriele
Lecroq Thierry
Lefebvre Arnaud
Prieur-Gaston Elise
Publication venue: 'Elsevier BV'
Publication date: 10/06/2013
Field of study

Constantinescu and Ilie (Bulletin EATCS 89, 167-170, 2006) introduced the notion of an Abelian period of a word. A word of length $n$ over an alphabet of size $\sigma$ can have $\Theta(n^{2})$ distinct Abelian periods. The Brute-Force algorithm computes all the Abelian periods of a word in time $O(n^2 \times \sigma)$ using $O(n \times \sigma)$ space. We present an off-line algorithm based on a select function that has the same worst-case theoretical complexity as the Brute-Force one but outperforms it in practice. We then present on-line algorithms that also make it possible to compute all the Abelian periods of all the prefixes of $w$. Comment: Accepted for publication in Discrete Applied Mathematics
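A naive check for Abelian periods can be sketched as follows. The abstract does not restate the definition, so the sketch assumes the usual one with head and tail: $w = u_0 u_1 \cdots u_{k-1} u_k$ where $u_1, \ldots, u_{k-1}$ share one Parikh vector of norm $p$, and the head $u_0$ (of length $h < p$) and the tail $u_k$ have Parikh vectors contained in it. The function names are hypothetical, and this is neither the Brute-Force algorithm of the paper nor tuned to its stated bounds.

```python
from collections import Counter


def contained(small: Counter, big: Counter) -> bool:
    """True if every letter count in `small` is at most its count in `big`."""
    return all(big[c] >= k for c, k in small.items())


def has_abelian_period(w: str, h: int, p: int) -> bool:
    """Check (h, p) under the assumed definition: head of length h < p,
    at least one full block of length p, and a tail shorter than p."""
    n = len(w)
    if not (0 <= h < p and h + p <= n):
        return False
    block = Counter(w[h:h + p])          # Parikh vector of the first full block
    if not contained(Counter(w[:h]), block):
        return False
    i = h + p
    while i + p <= n:                    # every further full block must match
        if Counter(w[i:i + p]) != block:
            return False
        i += p
    return contained(Counter(w[i:]), block)  # the tail has length < p here


def abelian_periods(w: str) -> list[tuple[int, int]]:
    """All pairs (h, p) accepted by has_abelian_period, by exhaustive search."""
    n = len(w)
    return [(h, p)
            for p in range(1, n + 1)
            for h in range(p)
            if has_abelian_period(w, h, p)]
```

Comparing whole `Counter` objects at each block keeps the code short; an implementation aiming at the stated complexity would update the letter counts incrementally instead.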

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Palermo

Fast Computation of Abelian Runs

Author: Fici Gabriele
Kociumaka Tomasz
Lecroq Thierry
Lefebvre Arnaud
Prieur-Gaston Elise
Publication venue: 'Elsevier BV'
Publication date: 22/12/2015
Field of study

Given a word $w$ and a Parikh vector $\mathcal{P}$, an abelian run of period $\mathcal{P}$ in $w$ is a maximal occurrence of a substring of $w$ having abelian period $\mathcal{P}$. Our main result is an online algorithm that, given a word $w$ of length $n$ over an alphabet of cardinality $\sigma$ and a Parikh vector $\mathcal{P}$, returns all the abelian runs of period $\mathcal{P}$ in $w$ in time $O(n)$ and space $O(\sigma+p)$, where $p$ is the norm of $\mathcal{P}$, i.e., the sum of its components. We also present an online algorithm that computes all the abelian runs with periods of norm $p$ in $w$ in time $O(np)$, for any given norm $p$. Finally, we give an $O(n^2)$-time offline randomized algorithm for computing all the abelian runs of $w$. Its deterministic counterpart runs in $O(n^2\log\sigma)$ time. Comment: To appear in Theoretical Computer Science

arXiv.org e-Print Archive

HAL - Normandie Université

A Note on Easy and Efficient Computation of Full Abelian Periods of a Word

Author: Fici Gabriele
Lecroq Thierry
Lefebvre Arnaud
Prieur-Gaston Élise
Smyth William F.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015
Field of study

Constantinescu and Ilie (Bulletin of the EATCS 89, 167-170, 2006) introduced the idea of an Abelian period with head and tail of a finite word. An Abelian period is called full if both the head and the tail are empty. We present a simple and easy-to-implement $O(n\log\log n)$-time algorithm for computing all the full Abelian periods of a word of length $n$ over a constant-size alphabet. Experiments show that our algorithm significantly outperforms the $O(n)$ algorithm proposed by Kociumaka et al. (Proc. of STACS, 245-256, 2013) for the same problem. Comment: Accepted for publication in Discrete Applied Mathematics
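Since a full Abelian period has empty head and tail, the period length $p$ must divide $n$ and all $n/p$ consecutive blocks must share one Parikh vector. The following is a minimal, direct sketch of that definition; the function name is hypothetical, and this is not the $O(n\log\log n)$ algorithm of the paper.

```python
from collections import Counter


def full_abelian_periods(w: str) -> list[int]:
    """Lengths p dividing len(w) such that all blocks of length p
    have the same letter counts (naive check, one divisor at a time)."""
    n = len(w)
    periods = []
    for p in range(1, n + 1):
        if n % p != 0:
            continue                      # empty head and tail force p | n
        first = Counter(w[:p])
        if all(Counter(w[i:i + p]) == first for i in range(p, n, p)):
            periods.append(p)
    return periods
```

Each divisor costs one pass over the word, so the total work is roughly $n$ times the number of divisors of $n$, which is enough for small experiments.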

arXiv.org e-Print Archive

Research Repository

Archivio istituzionale della ricerca - Università di Palermo

Efficient validation and construction of border arrays

Author: Arnaud Lefebvre
Jean-Pierre Duval
Thierry Lecroq
Publication venue
Publication date: 01/01/2006
Field of study

In this article we present an on-line linear time and space algorithm to check whether an integer array f is the border array of at least one string w built on a bounded or unbounded size alphabet Σ. We first show some relations between the border array of some string w and the skeleton of the DFA recognizing Σ*·w, independently of the explicit knowledge of w. This enables us to design algorithms for validating and generating border arrays that outperform existing ones [4, 3]. The validating algorithm lowers the delay (time spent on one element of the array) from O(|w|) to O(min{|Σ|, |w|}) compared with the algorithms in [4, 3]. Finally we give some results on the numbers of distinct border arrays for some alphabet sizes.
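For readers unfamiliar with border arrays, the sketch below computes one with the classical failure-function recurrence and validates a candidate array by exhaustive search over a small alphabet. Both function names are hypothetical. The exponential validator only illustrates the problem statement and is not the on-line linear-time algorithm described in the abstract.

```python
from itertools import product


def border_array(w) -> list[int]:
    """f[i] = length of the longest proper border of w[0..i]."""
    f = [0] * len(w)
    k = 0
    for i in range(1, len(w)):
        while k > 0 and w[i] != w[k]:
            k = f[k - 1]                  # fall back to the next shorter border
        if w[i] == w[k]:
            k += 1
        f[i] = k
    return f


def is_border_array_bruteforce(f, alphabet: str = "ab") -> bool:
    """Try every string of length len(f) over `alphabet` (exponential time)."""
    target = list(f)
    return any(border_array(s) == target
               for s in product(alphabet, repeat=len(target)))
```

For example, `border_array("abaab")` gives `[0, 0, 1, 1, 2]`, while `[0, 2]` is rejected for every alphabet because a string of length 2 cannot have a proper border of length 2.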

HAL - Normandie Université

CiteSeerX